Statistically-constrained shallow text marking: techniques, evaluation paradigm and results
Abstract
We present three natural language marking strategies based on fast and reliable shallow parsing techniques and on widely available lexical resources: lexical substitution, adjective conjunction swaps, and relativiser switching. We test these techniques on a random sample of the British National Corpus. Individual candidate marks are checked for goodness of structural and semantic fit, using both lexical resources and the web as a corpus. A representative sample of marks is given to 25 human judges to evaluate for acceptability and preservation of meaning. This establishes a correlation between corpus-based felicity measures and perceived quality, and makes qualified predictions. Grammatical acceptability correlates strongly with our automatic measure (Pearson’s r = 0.795, p = 0.001), allowing us to account for about two thirds of the variability in human judgements. A moderate but statistically non-significant correlation (Pearson’s r = 0.422, p = 0.356) is found with judgements of meaning preservation, indicating that the contextual window of five content words used for our automatic measure may need to be extended.
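The step from r = 0.795 to "about two thirds of the variability" is the usual squaring of Pearson's r to get the share of variance explained. The following is a minimal sketch of that arithmetic, not the authors' evaluation code; the function names `pearson_r` and `variance_explained` are hypothetical, and the only figures used are the two coefficients quoted in the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def variance_explained(r):
    """Share of variance in one variable accounted for by the other (r squared)."""
    return r * r


if __name__ == "__main__":
    # Coefficients as reported in the abstract.
    print(round(variance_explained(0.795), 3))  # grammatical acceptability
    print(round(variance_explained(0.422), 3))  # meaning preservation
```

Squaring 0.795 gives roughly 0.63, which matches the abstract's "about two thirds"; squaring 0.422 gives roughly 0.18, consistent with the weaker result for meaning preservation.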
Similar resources
Controlling Gender Equality with Shallow NLP Techniques
This paper introduces the “Gendercheck Editor”, a tool to check German texts for gender discriminatory formulations. It relies on shallow rule-based techniques as used in the Controlled Language Authoring Technology (CLAT). The paper outlines major sources of gender imbalances in German texts. It gives a background on the underlying CLAT technology and describes the marking and annotation strat...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian has multi-part words as well. In Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
Annotating and Automatically Tagging Constructions of Causal Language
One popular paradigm for such tasks is shallow semantic parsing—marking relations and their arguments in text. Efforts to date have focused on individual assertions expressed by individual words. While fruitful, this approach falters on semantic relationships that can be expressed by more complex linguistic patterns than words. It also struggles when multiple meanings are entangled in the same ...
Evaluation of seasonal variability in surface water quality of Shallow Valley Lake, Kashmir, India, using multivariate statistical techniques
Seasonal variation in water quality of Anchar Lake was evaluated using multivariate statistical techniques: principal component analysis (PCA) and cluster analysis (CA). Water quality data collected during 4 seasons was analyzed for 13 parameters. ANOVA showed significant variation in pH (F3 = 10.86, P < 0.05), temperature (F3 = 65, P
Automatic Short Answer Marking
Our aim is to investigate computational linguistics (CL) techniques in marking short free text responses automatically. Successful automatic marking of free text answers would seem to presuppose an advanced level of performance in automated natural language understanding. However, recent advances in CL techniques have opened up the possibility of being able to automate the marking of free text ...
Publication date: 2007